*******************************************
* How to make raster effects on the NES ? *
*    in a game programmer's viewpoint     *
*              by Bregalad                *
*******************************************

History :

May 21st, 2015
- Cleaned up the wording a lot to make the document sound more professional. No more usage of the 2nd person.

February 17th, 2009
- Added a chapter about precise scanline operation
- Added a chapter about switching nametables midframe (what have I been missing)
- Some other minor changes

January 21st, 2009
original release

------------
Introduction
------------

The graphics possibilities of the NES gaming console are severely limited, but it is possible to greatly enhance them with mid-frame effects, that is, changing the state of some graphics registers while a frame is being rendered. Television standards render pixels like text, from left to right then top to bottom, so it is possible to change parameters in the middle of the screen so that the area above the split uses some parameters and the area below uses others. Understanding this may not always be easy, but it is crucial for getting good effects in games and for getting rid of some limitations the system would normally impose on the programmer.

While many documents are already available on the net to explain such behaviour, they are scattered all over the place and sometimes use wording or explanations that can be very tough to understand, especially for NES programmers, as they are almost always written from the emulator author's viewpoint instead of the NES programmer's viewpoint. I consider that knowing at what exact clock cycle a latch inside the NES changes its state is not meaningful for NES programming, but knowing how to put such a behaviour to good use is.

It is assumed that the reader already knows the basics of the NES architecture, PPU registers, tiles, nametables and a bit about scrolling.
Readers unfamiliar with these topics should first check other documents that cover basic graphics rendering for the NES.

It is not necessary to read the whole document to understand the concept. It is fine to go straight to the end if the middle is not of interest. This document ended up much longer than what I expected to write at first, but since I wrote it I don't want to just delete it.

------------
Content list
------------

- About frame timing
- List of possible mid-frame effects with registers
  * $2000
  * $2001
  * $2002
  * $2003/4
  * $2005
  * $2006/7
- Bankswitching CHR-ROM midframe
- Changing mirroring midframe
- Practical use
- Synchronizing with the PPU
- Writing timed code
- Low level scan line timing
- List of possible mid-line effects
- Warning about some difficult instructions

------------------
About frame timing
------------------

I will not go into details here, because details are rarely relevant when making a game or tech-demo. The information is ordered so that someone who simply wants to write a game can figure out what is going on without spending too much time on useless details.

- The NES outputs 240 lines of 256 pixels to the television. Pixels are rendered left to right, then top to bottom, similar to text on a book's page.
- After pixel 256 of each line, it takes 85 more "shadow pixels" to go to the next line, during which no pixels are rendered. This is called the HBlank.
- After line 240, it takes 20 (NTSC) or 70 (PAL) more "shadow lines" to go back to the first line, during which no pixels are rendered. This is called the VBlank.
- The whole process is repeated at a rate of 60 (NTSC) or 50 (PAL) times per second.
- The term "frame" ambiguously refers to both a whole image and the time spent outside VBlank (in the lines that are displayed).
- Only during the VBlank period is it possible to access VRAM via registers $2006 and $2007. The rest of the time the PPU, the graphics chip, reads VRAM automatically to render graphics. Using $2007 during that time will cause the chip to operate erroneously.
  Sprite RAM shall not be accessed through $2004 and $4014 during that time either. An NMI interrupt can be configured to trigger at the start of the VBlank period. A typical program writes pending updates to a buffer, then copies the buffer to the actual VRAM in the NMI interrupt routine.
- Any write to $2006 overwrites the scrolling values. Before the end of the VBlank period, but after all VRAM updates are done, the scrolling should be set using registers $2005 and $2000 for the frame that is to come.
- A forced blanking mode is available in order to access $2006/$2007 at any time, but no image is rendered while it is active. This is useful when completely rewriting the screen, for example.

------------------------
List of possible effects
------------------------

A "raster effect" is any graphical effect that requires writing to PPU registers during the visible scanlines (outside of VBlank). The point in the video rendering where the state is changed is referred to as the "split point". It is possible to have multiple split points.

The PPU has 8 registers, ranging from $2000 to $2007. I will mention what is possible for each register.

-------------------------------------
Mid-frame effects possible with $2000
-------------------------------------

- Changing nametable -
Bits $2000.0 and $2000.1 are the 9th scrolling bits for H and V respectively, so the details are in the paragraph about scrolling. By changing bit $2000.0, it is possible to change the nametable displayed at any time. Interesting effects can be obtained that way. For example, if both nametables show similar data but differ somewhere, effects can be done at the place where the graphics differ without altering the rest of the screen. It is also possible to have the same tile data in both tables but with different attributes, which enables fun effects with the colour of objects on the screen, or simulated transparency effects.
However, changing $2000.1 (the vertical 9th scrolling bit) has no direct effect during the frame. The kind of effects just described requires vertical mirroring.

- Changing patterns -
Bits $2000.3 and $2000.4 control which pattern tables are used for sprites and playfield respectively. It is also possible to change them during the frame, so that the patterns used to draw the graphics change from one part of the screen to another. This can also lead to many effects. For example, if the patterns in both tables are close but with a different luminosity, transparency effects could be simulated. But the main purpose of this feature is to bypass the 256-tile limit for either the playfield or the sprites, at the expense of the other.

- Changing sprite mode -
It is also possible to change bit $2000.5, so that the sprite size (8x8 or 8x16) is changed in the middle of the screen. This is not really recommended, as it may have strange effects in the region where it is switched. In addition, no really impressive effects can be obtained that way as far as I know. One idea would be to have 8x8 sprites for a status bar and 8x16 for the gameplay, or vice-versa, which could be useful. If this is done, be sure to test it extensively on real hardware.

- Disabling interrupts -
Not really an effect, but by changing bit $2000.7, VBlank NMI interrupts can be enabled and disabled. This can be done at any time if the code must not be interrupted, as setting the I flag in the processor status only prevents IRQs and does not prevent VBlank NMIs from happening. If there is a transition from disabled to enabled interrupts during VBlank, an interrupt will start right after the write, and as a consequence the timing will not be the same as usual. VRAM writes may accidentally spill into the frame time that way.
A good practice is to read $2002 before enabling interrupts again, to be sure that no interrupt fires straight away: it is better to skip a frame entirely than to display glitches on the screen.

All other bits of $2000 are not useful to change mid-frame as far as I know.

-------------------------------------
Mid-frame effects possible with $2001
-------------------------------------

- Changing colour emphasis -
Bits 0, 5, 6 and 7 of $2001 enable grey-scale mode and/or colour emphasis. Those changes take effect immediately, even in the middle of a line. Not only can some graphic effects be done that way, but the most useful application is to see graphically how much time a routine takes, for debugging: the grey-scale bit can be set while the CPU is busy, and cleared afterwards. The height of the grey band shows at a glance how long the routine took. Great transparency effects can be achieved with a combination of grey-scale and emphasis. Unfortunately the CPU's clock rate is much slower than the PPU's, thus much precision cannot easily be achieved for effects.

- Changing sprite or background enable -
Bits $2001.3 and $2001.4 can be changed at any time to display or hide the background and the sprites. This takes effect immediately. As long as only the background or only the sprites are hidden, this can lead to interesting effects. However, if both bits are set to '0', the PPU enters the forced blanking mode. When it is necessary to mask both the sprites and the background but forced blanking is not desirable, it is possible to swap in CHR-ROM banks filled with a solid colour in the pattern tables to simulate the effect.

- Forced blanking mode -
It is possible to selectively use forced blanking as part of rendering frames, but this should be tested extensively on real hardware. Usually, only the background colour at $3f00 is displayed in this mode. There is an exception: when $2006 is pointing to a palette entry (in the range of $3f00 to $3fff), this colour is shown instead.
By cleverly blanking areas of the screen while leaving other areas enabled, it is possible to do heavy pattern or name table updates in order to get some effects that would otherwise not be possible within the short VBlank of the NTSC NES. It is also possible to rewrite the palette that way.

Important : While in forced blanking mode, the scrolling counters are not updated, the OAM DRAM is not refreshed, and the palette DRAM is not refreshed either. Another important thing is that the PPU still continues to count lines and columns and triggers VBlank interrupts normally in that mode.

-------------------------------------
Mid-frame effects possible with $2002
-------------------------------------

$2002 is a read-only register, so effects cannot be done directly through it. However, reading this register is a key to doing mid-frame effects.

- VBlank flag -
$2002.7 tells whether the PPU has entered VBlank: this flag is set to '1' when a VBlank period starts (at line 240), and it resets to '0' when the VBlank period ends or when the $2002 register is read. This flag is set at the same time a VBlank NMI fires, if enabled via $2000.7. Reading $2002 at the exact time a VBlank starts can clear the flag and still return a '0' in bit 7, so that a VBlank is missed. For that reason, when frames must not be missed, $2002 shall not be polled when a VBlank is about to start; NMI should be used instead.

- Sprite zero hit -
$2002.6 is a very interesting flag. This flag is always cleared at the start of a new frame (at the end of the VBlank), and is set as soon as a non-transparent pixel of sprite zero graphically overlaps a non-transparent pixel of the playfield. Sprite zero is the sprite at the lowest addresses of sprite RAM. By placing this sprite at the right place, it is possible to detect '0' to '1' transitions and synchronize the CPU with the PPU's rendering (see the paragraph about synchronization). At least one pixel must "hit" for this flag to become true.
If no pixel graphically overlaps, the flag remains cleared during the entire frame, and in some cases this will lead programs to lock up, as they endlessly wait for this bit to rise. Overlaps in the last column of the screen (X=255) do not count either, nor do overlaps within the first 8 pixels if left clipping is enabled via $2001. It is up to the programmer's imagination to hide sprite zero so that it reliably hits every frame and does not get in the way of the graphics.

- Sprite overflow flag -
$2002.5 is also interesting. This flag is cleared at the beginning of each frame, and set as soon as more than 8 sprites are met on the same line. When more than 8 sprites are supposed to be seen on the same line, the NES discards the lower priority sprites, and they are not visible. When that happens, this flag is set, and once set it remains set until the end of the frame. This flag could be used to synchronize with the PPU. Unfortunately, this flag is not reliable: it can trigger false positives, and can also fail to rise when it should. Any usage of this flag should be extensively tested on real hardware.

- Using the 8 sprites per line limitation -
Not really an effect of the $2002 register, but something interesting while on the subject of sprites. If sprites should be disabled on a range of lines, it is a good idea to use the 8 sprites per line limitation to purposely hide sprites, by placing 8 dummy high priority sprites. It is even possible to hide only some sprites this way, by giving hidden sprites a lower priority, and non-hidden sprites a higher priority, than the 8 dummy sprites. However, many people play games/demos with the sprite limit disabled in their emulators, as this limit is in most cases more annoying than useful, thus it would be a good idea to explain in the documentation that this limit must be turned on for that particular program.
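
Because a missed sprite zero hit makes a plain polling loop wait forever, a timeout can be added to the loop. The following is only a minimal sketch: the label WaitSprite0 and the timeout length are assumptions made for illustration, and the flag is assumed to be already clear when the routine is called.

```asm
;Hypothetical helper: wait for the sprite zero hit, but give up after
;256 polls so that a missed hit cannot lock up the program.
;The timeout length is an assumption and must be adapted to where
;sprite zero is placed on the screen.
WaitSprite0:
    ldx #$00        ;2   timeout counter, wraps after 256 iterations
-   bit $2002       ;4   bit 6 of $2002 is copied into the V flag
    bvs +           ;2/3 hit detected: leave the loop
    dex             ;2
    bne -           ;3   keep polling until the counter reaches zero
+   rts             ;V set = hit, V clear = timed out
```

Each poll takes 11 cycles in this version, so the window in which the loop can exit is 11 CPU cycles wide, and 256 polls last roughly 25 NTSC lines. The caller can test the V flag (bvs/bvc) right after the routine returns to know whether the hit really happened.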
-------------------------------------------
Mid-frame effects possible with $2003/$2004
-------------------------------------------

If there are any PPU registers on the NES which are still really obscure many years after the console was first reverse-engineered, they are without a doubt $2003 and $2004. Interesting effects can likely not be done by writing to $2003 and $2004 during the frame; it will probably have no effect other than adding graphical glitches.

$2004 can also be read. During VBlank this allows reading back what was written to OAM, but I really see no reason to do that. During the frame, the PPU has to read sprite data from OAM, and while doing that, the data is also mirrored to $2004. Unfortunately, the CPU is much slower than the PPU, and the chances of putting that to good use are slim. It has been recently discovered that this could be used for synchronization purposes, but the results depended on the model of the console (Famicom or NES) and the revision of the PPU, leading to the conclusion that it is not a good idea to use this, since a program is usually required to function identically on all models of the console. When developing any tricks using $2003 or $2004, extensive testing on real hardware, and on different revisions of the console, must be done.

-------------------------------------
Mid-frame effects possible with $2005
-------------------------------------

DISCLAIMER: The $2005 register is actually two registers; each write flips an internal latch that toggles which register is written to. This latch is cleared when $2002 is read. $2005/1 refers to odd writes, and $2005/2 refers to even writes.

- Changing horizontal scrolling -
Writing to $2005/1 midframe simply changes the horizontal scroll position. This can lead to many interesting effects. Writing to $2005/2 has no effect midframe: $2005/2 is only useful in VBlank for setting the vertical scroll position. A dummy $2005/2 write has basically the same effect on the latch as a $2002 read.
When combining $2005/1 with $2000.0, it is possible to set the scrolling to any possible horizontal position.

- Unwanted effects -
Writing to $2005/1 in the middle of a line can cause undesirable glitches on the screen. In order to change the scrolling without any glitches, proceed by trial and error: add or remove some NOP instructions before the $2005 write, and fine-tune the timing until glitches no longer happen.

-------------------------------------------
Mid-frame effects possible with $2006/$2007
-------------------------------------------

DISCLAIMER: The $2006 register is actually two registers; each write flips an internal latch that toggles which register is written to. This latch is cleared when $2002 is read. $2006/1 refers to odd writes, and $2006/2 refers to even writes.

- Set scroll position to a fixed value -
Most documents that talk about midframe $2006 writes say complicated things about it, because emulating it is complicated, but on the programmer's side it is really simple: writing a 16-bit address to $2006 during the frame simply makes the tile at the corresponding name table address show up on the left edge of the next line. This allows setting both vertical and horizontal scrolling, but with a resolution of 8 pixels. Example: if $21 then $40 are written to $2006, the tile at address $2140 will be shown on the left edge of the next scanline. The corresponding scrolling values do not even have to be calculated, which is really useful. From now on the tile-resolution scroll will be called the coarse scroll, and the scrolling within tiles will be called the fine scroll.

- Fine scrolling -
The higher 2 bits of the address (bits 12 and 13) allow an increase in the scrolling resolution. In the above example the addressed tile will be shown, but its first 2 or 3 lines will be "eaten", because the higher 2 bits say '2'.
If $01 then $40 were written instead, the same tile would be shown, but from the top (possibly with one "eaten" line). The highest nybble of the address is not useful for tile addressing as it is hardwired to $2xxx, thus it has been reused for vertical fine scrolling. Horizontal fine scrolling is unaffected by $2006 writes. Unfortunately, only fine scroll values of $0 to $3 are possible; $4 to $7 are not accessible. This probably comes from the fact that only $0000-$3fff are addressable within the PPU, and $4000-$7fff are not, thus address bit 14 physically does not exist.

Depending on the exact timing, the fine scroll will be increased by 1 line: this happens if the write to $2006/2 occurs just before the PPU goes to the next line and increments a counter internally.

As a consequence of the hardwired upper bits, there is also no way to hijack pattern table data as name table data by writing an address between $0000 and $1fff; instead, a normal nametable will be used with a different fine scroll value (it's such a shame). However, it is possible to hijack attribute table data as name table data, by writing an address between $x3c0 and $x3ff. This has the same effect as using a vertical scroll value between $f0 and $ff in VBlank, the so-called "minus" scroll values. The same bytes will be used simultaneously as name table and attribute table data if this happens: good luck making a practical use of it.

- Undesirable glitches -
Writing to $2006 mid-scanline can cause undesirable glitches on the screen. In order to change the scrolling without any glitches, proceed by trial and error: add or remove some NOP instructions before the $2006 writes, and fine-tune the timing until a setup with no glitches is found. It is better to test with an accurate emulator (Nestopia, Nintendulator) or real hardware. Read the chapter about low level scan line timing for details.
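
The $2006 sequence from the $2140 example could be written as follows. This is only a sketch: it assumes the code already runs at a suitable point of the scanline (for instance shortly after a sprite zero hit), and it uses the $01/$40 variant so that the tile row is shown from its top.

```asm
;Sketch: make the tile row at name table address $2140 appear on the next line.
;The high byte is $01 instead of $21 so that the vertical fine scroll is 0.
    lda #$01        ;high byte, prepared in advance
    ldx #$40        ;low byte, prepared in advance
    bit $2002       ;reset the $2005/$2006 write latch
    sta $2006       ;$2006/1
    stx $2006       ;$2006/2: the new position takes effect here
```

Loading both values before the writes keeps the two stores only 4 cycles apart, so that only the position of the second store has to be tuned with the NOP technique.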
- Multidirectional scrolling -
By writing to $2006/1, $2006/2 then $2005/1, the screen is first set to a known position, and then the desired horizontal scroll is applied. The coarse horizontal scroll is set twice, once through $2006 and once through $2005/1. I strongly recommend that both values match in order to avoid glitches: ignoring the desired horizontal scroll completely when writing to $2006 is possible, as the value will be fixed by the following $2005/1 write, but it will typically result in visible glitches.

- About $2007 -
As far as I know, accessing $2007 during the frame is a bad idea and will only introduce graphical glitches. Forced blanking mode should be used for any writes to VRAM or palette data. Reading $2007 could however be usable outside of forced blanking: the logical effect would be that the PPU skips a tile ($2000.2 clear), or a row of tiles ($2000.2 set). Interesting effects could be achieved that way, similar to the SNES "offset per tile" mode. However I am not sure this works: extensive testing on real hardware is necessary in all cases.

------------------------------
Bankswitching CHR-ROM midframe
------------------------------

If a mapper with CHR-ROM (or more rarely, CHR-RAM) switching is used, it is possible to change pattern tables midframe to get past the 256-tile limit for both the playfield and the sprites, not at the expense of the other this time. In addition to having more tiles, similar but different patterns could be switched in to get transparency effects.

---------------------------
Changing mirroring midframe
---------------------------

If a mapper with mirroring control is used, it is possible to change the mirroring midframe. The major use of this feature is to have the gameplay field in one nametable and the status bar in the other nametable, using 1-screen mirroring, but it could have other uses as well.
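
Coming back to the multidirectional scrolling sequence ($2006/1, $2006/2 then $2005/1), a sketch could look like this. The target position (tile row 10, tile column 4, 3 pixels of horizontal fine scroll, name table at $2000) is an arbitrary assumption chosen for illustration, and the code is assumed to run at a suitable point of the scanline.

```asm
;Name table address of row 10, column 4 : $2000 + 10*32 + 4 = $2144
;Horizontal scroll for column 4, fine 3 : 4*8 + 3          = $23
    bit $2002       ;reset the $2005/$2006 write latch
    lda #$01        ;high byte of $2144 with the upper bits cleared (vertical fine scroll 0)
    sta $2006       ;$2006/1
    lda #$44
    sta $2006       ;$2006/2: coarse position = row 10, column 4
    lda #$23
    sta $2005       ;$2005/1: same coarse column (4), plus horizontal fine scroll 3
```

Both the $2006 and the $2005/1 writes select column 4 here, which follows the recommendation to keep the two coarse horizontal values identical.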
-------------
Practical use
-------------

While most of the mentioned effects are cool and can greatly extend the graphical capabilities of the NES, it is very impractical to do full-screen effects in a real game, simply because it would take 100% of the CPU time, leaving no time for any game logic. The other problem is that it is only possible to reliably synchronize the CPU with the PPU 3 times per frame under standard conditions : VBlank, end of VBlank, sprite zero hit. The latter two rely on a sprite zero hit actually occurring, and are not suitable for all cases.

Getting more than one split point per frame is usually done by writing timed code from one synchronized point, and while this can be fun at first, it can become tedious when very precise timing is required.

In order to circumvent those limitations IRQs may be used. The NES by itself only has 2 IRQ sources, the DMC IRQ and the APU frame IRQ; both are synchronized with the APU rather than the PPU, and both are mostly useless. The APU frame IRQ will probably never be useful for graphics, because it triggers at a rate close to the VBlank NMI but is not synchronized with the PPU. The DMC IRQ on the other hand can be useful to free CPU time: a silent DMC sample can be played to trigger an IRQ as it ends. There is a very large jitter window in which the IRQ might trigger (many lines), thus it can only be used as a tool to free CPU time, and cannot be used for synchronizing by itself. The sample can be made to start during VBlank, and end just before a sprite zero hit. Inside the IRQ interrupt service routine, the sprite zero flag can then be polled.

Cartridge IRQs (from the mapper) are the most useful, but require a mapper that supports them. With such a mapper, it is much easier to get interesting effects.
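
The DMC IRQ trick could be set up roughly as follows. This is a sketch under several assumptions that do not come from the text above: SILENT_ADDR and SILENT_LEN are hypothetical constants, the silent sample is assumed to be a run of $00 bytes stored in $C000-$FFFF at a 64-byte aligned address, and the length and rate have to be tuned by trial and error so that the IRQ fires a little before the sprite zero hit.

```asm
;SILENT_ADDR (hypothetical) = (address of the silent sample - $C000) / 64
;SILENT_LEN  (hypothetical) = sample length value, to be tuned
StartDmcTimer:          ;to be called during VBlank
    lda #$40
    sta $4017           ;disable the APU frame IRQ
    lda #$8f
    sta $4010           ;DMC IRQ enabled, no looping, fastest rate
    lda #$00
    sta $4011           ;output level 0, so that the $00 bytes stay silent
    lda #SILENT_ADDR
    sta $4012           ;sample start address
    lda #SILENT_LEN
    sta $4013           ;sample length
    lda #$1f
    sta $4015           ;start the DMC sample, keep the other channels enabled
    cli                 ;allow IRQs
    rts

IRQ:                    ;Pointer at $fffe points here
    pha
    lda #$0f
    sta $4010           ;turn the DMC IRQ off, which also acknowledges it
-   bit $2002
    bvc -               ;the IRQ only gets close: wait for the exact sprite zero hit
                        ;(the mid-frame register writes go here)
    pla
    rti
```

Between the call to StartDmcTimer and the IRQ, the CPU is free to run game logic; only the short wait inside the handler is spent polling.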
--------------------------
Synchronizing with the PPU
--------------------------

Synchronizing at VBlank :

NMI             ;Pointer at $fffa points here
    pha
    txa         ;Save the CPU registers
    pha
    tya
    pha
    bit $2002   ;Acknowledge the interrupt
    ......      ;We are synchronized at scanline 240 (start of VBlank) here
    ......
    pla         ;Restore the CPU registers in reverse order
    tay
    pla
    tax
    pla
    rti         ;End of NMI interrupt service routine

Synchronizing at scanline 0 by detecting a '1' to '0' transition:

;It is assumed the sprite zero hit flag has been set on the previous frame
-   bit $2002
    bvs -
;When the CPU leaves that loop, it is synchronized at scanline 0

Synchronizing at sprite zero hit by detecting a '0' to '1' transition:

;It is assumed the flag is already clear here
-   bit $2002
    bvc -
;When the CPU exits the loop, it is at the location of the first pixel that makes the collision

A bit / bvc loop takes 7 clock cycles to complete, so the loop can exit anywhere within a 7 CPU clock cycle wide window. An IRQ or NMI also triggers within a window of up to 7 CPU clock cycles, as the current instruction is always finished first and instructions are up to 7 cycles long.

------------------
Writing timed code
------------------

If more than a single split point is used, it will be necessary to write timed code from a previous synchronization point. This allows things like writing to registers on each line for more effects. A few points have to be remembered when writing timed code:
- Each instruction takes a certain number of clock cycles; lookup tables exist for that purpose.
  There is a good detailed table available at http://6502.org/tutorials/6502opcodes.html
  A good practice is to write the cycle counts in comments, and to sum the cycles of all instructions to know the total time a block takes.
- An NTSC line is exactly 113 + 2/3 clock cycles
- A PAL line is exactly 106 + 9/16 clock cycles
- A PAL instruction takes, relative to the PPU, 16/15 of its NTSC counterpart
- If writing to registers every line, a loop whose iterations follow these timings should be used
- If a DMC sample is playing, it will steal cycles from the CPU, and thus have the undesirable effect of altering the timing significantly. DMC playback should be avoided when timed code is used.

- To handle the fractional clock cycles within a loop -
A convenient way of doing it is to do the following in each iteration of the loop :

    lda var     ;3
    clc         ;2
    adc #$ab    ;2
    sta var     ;3
    bcs +       ;This instruction takes on average almost exactly 2 + 2/3 clock cycles for NTSC timed code
+   ....

    lda var     ;3
    clc         ;2
    adc #$90    ;2
    sta var     ;3
    bcs +       ;This instruction takes on average exactly 2 + 9/16 clock cycles for PAL timed code
+   .....

The branch takes 3 cycles when taken and 2 otherwise, and the carry is set on $ab/256 (about 2/3) or $90/256 (exactly 9/16) of the iterations. This is not the only way to do this, but I find it one of the most convenient. Feel free to come up with other ideas in your own game/demo.

- To adjust the starting point of the loop -
Often after a synchronization point (such as a sprite zero hit or an IRQ), the code that handles midframe register writes should not be run immediately, because this would seriously limit the control over when exactly the register write happens. A small adjustable delay is inserted first. There is no instruction that takes one single clock, but nop takes 2 and lda zeropage takes 3; by combining them it is possible to get any delay of 2 clocks or more. Examples of how to adjust a delay code:

    nop
    nop         ;Takes 4 clocks, but too early?
    .....

    nop
    nop
    nop         ;Takes 6 clocks, but too late?
    .....

    lda $ff     ;(dummy read, the value is not used)
    nop         ;Takes 5 clocks -> get more precision and draw your conclusions
    ......
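
Putting these pieces together, a loop that writes a new horizontal scroll value on every NTSC scanline could look like the sketch below. The names ScrollTable, Frac and LINES are hypothetical; it is assumed that Frac lives in zero page, that the indexed reads of ScrollTable never cross a page boundary, and that none of the branches crosses a page boundary. The small countdown loop in the middle is only padding.

```asm
    ldy #$00            ;line index
LineLoop:
    lda ScrollTable,y   ;4
    sta $2005           ;4  $2005/1: new horizontal scroll
    sta $2005           ;4  $2005/2: dummy write, puts the latch back
    lda Frac            ;3
    clc                 ;2
    adc #$ab            ;2
    sta Frac            ;3
    bcs +               ;2 + 2/3 on average
+   ldx #15             ;2
-   dex                 ;2  \ padding: 15 * 5 - 1 = 74 cycles
    bne -               ;3  /
    nop                 ;2
    nop                 ;2
    nop                 ;2
    iny                 ;2
    cpy #LINES          ;2
    bne LineLoop        ;3
                        ;total: 111 + (2 + 2/3) = 113 + 2/3 cycles per line
```

Where in the line the writes land is decided by the delay code that runs between the synchronization point and the entry of the loop, so that part still has to be tuned by hand.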
- Longer delays -
For delays longer than just a few clocks there is no need for a long chain of nops. Instead a dummy loop is more compact:

    ldx #Constant
-   dex
    bne -

The constant should be adjusted by trial and error in order to produce the correct delay. Increasing the constant by 1 means 5 more clocks of delay. This dummy loop should be followed or preceded by the fine-tuning technique described above. When converting from NTSC to PAL, multiply the constant by 15/16. When converting from PAL to NTSC, multiply the constant by 16/15. The fine-tuning part should of course be readjusted manually in both cases if the exact time of the register write matters.

--------------------------
Low level scan line timing
--------------------------

This complicated chapter is only for people who really want to understand the low level things, and for those who really cannot get rid of graphical glitches in their raster effects despite the techniques explained in the previous chapters.

As already said above, the PPU renders lines of 256 visible pixels and 85 "shadow pixels". On each PPU cycle, a pixel is rendered. On an NTSC NES, 1 CPU clock is 3 PPU cycles long. On a PAL NES, 5 CPU clocks are 16 PPU cycles long.

A key to understanding low level operation is to know how exactly the PPU chip fetches data from VRAM. I won't give any details as there is another document that describes all the details at http://nesdev.com/2A03%20technical%20reference.txt. However remembering the following can help:
- The PPU fetches name table, attribute table and background pattern table data for the current line during cycles 0-255
- The PPU fetches sprite RAM and sprite pattern table data for the next line during cycles 256-322
- The PPU fetches name table data again for the next line during cycles 322-341

How they decided to number cycles is beyond me: because this repeats over and over, "cycle 0" could have been anywhere.
However I use the convention described in Nintendulator's debugger and Brad Taylor's document, even if I believe it would have made more sense to place "cycle 0" somewhere else (namely, on what they call "cycle 256"; then all fetches would be for the current line, and things would be less confusing).

When updating scrolling registers, there is a 66 cycle window where scrolling counters can freely be updated without glitches. The PPU fetches 36 tiles, but only 31-33 are actually visible, which makes the actual glitch-free window larger.

Another thing to understand is the jitter: because a polling sync loop or an IRQ has a window of firing times (typically 7 clocks), the time of the actual writes moves randomly within this window from frame to frame. This is called jitter.
- A "bit $2002/branch" loop, or an interrupt over arbitrary code, gives 7 clocks of jitter (= 21 pixels NTSC, about 22 pixels PAL)
- An interrupt over a "lda zeropage/branch" loop or a "jmp here" loop gives 3 cycles of jitter (= 9 pixels NTSC, about 10 pixels PAL)

When it is necessary to reduce jitter, synchronizing with the NMI interrupt is better than with a sprite zero hit, provided the idle loop is written accordingly. I don't know of any way to reduce jitter even further, but maybe there is one.

On cycle 256, the PPU increments its internal row counter, and the coarse horizontal scrolling is reloaded. There are basically two correct ways to update the scrolling via $2006: either the new scroll is always written before cycle 256, or always after. If register $2006 is written sometimes before and sometimes after (because of jitter), the graphics will shake vertically.

Notes:
- When writing before cycle 256, the fine vertical scroll is incremented immediately before being used. If the write is too early, glitches will appear on the right edge of the screen.
  This explains the glitches above Shadow Man's face in Mega Man 3 for example : the scrolling write was done way too early.
- When writing after cycle 256, the written scroll value is used directly as-is. If $2005 is also written to, the coarse value (high 5 bits) does not take effect until the next line. If the write is too late, glitches will appear on the left edge of the screen.
- Only the second $2006 write takes effect; the first write is just buffered internally, and can be done "too early" without causing any problem.

Scroll changes done solely through $2000.0 and $2005 should always be done before cycle 256 to avoid glitches.

---------------------------------
List of possible mid-line effects
---------------------------------

It is possible to do multiple writes to the same registers during a single line to achieve even more advanced effects. The list of possible effects is however quite short, and all of them are hard to pull off because they need really precise timing, and the typical jitter is at least ~9 pixels. As far as I know only 2 emulators emulate this kind of effect properly (Nintendulator, Nestopia), and in all cases extensive testing on real hardware is a must in order to confirm that the effect works properly.

$2000.4 : The pattern table used for the playfield can be changed mid-line in order to bypass the 256 tile limitation within the same line. A zone of at least 2 tiles which are unaffected by the change is necessary for the effect to look good. As far as I know only the game "Marble Madness" did that (to display text messages over a background that already used all the pattern tables).

$2001 : Grey-scale and colour emphasis bits can be switched at any time (the left clipping too, but it's not very useful). As far as I know only the game "Final Fantasy" did that (when lighting an elemental orb).

$2005 : The fine scrolling (low 3 bits) can be changed mid-line.
A zone of at least 2 tiles with a solid colour around the change is necessary for the effect to look good. As far as I know this was never used intentionally.

$2006 : The PPU can be forced to fetch a specific tile at any time, but I do not believe this can be useful because of the jitter. If the jitter could ever be reduced enough to fall inside a single tile, it might become possible to do amazing effects with that.

Other : It is possible to bankswitch CHR-ROM mid-line in a way similar to the mid-line pattern table switch through $2000 described above. As far as I know only the game "Mother (J)" did that (when opening the menu). Another possibility is to do constant bankswitches between BG and sprite fetches, for example to use 512 tiles for sprites (using 8x16 mode) and 256 more tiles for the playfield. (The MMC5 chip does it automatically.) This is possible on any mapper that can bankswitch CHR, but requires wasting 100% of the CPU's time on all lines where this effect is active, seriously limiting its usefulness.

Other (bis) : For mappers that support selectable mirroring it is possible to switch which name table is displayed mid-line. Again a zone of at least 2 tiles which are not affected by the change is required for the effect to look good. Such an effect will not work using $2000.0 or $2000.1. As far as I know this effect was never used intentionally.

-----------------------------------------
Warning about some difficult instructions
-----------------------------------------

Some instructions take a variable number of clocks depending on whether a page boundary is crossed (that is, whether the high byte of the destination address changes). For example:

    ldx #$00
    adc $6ff,X      ;Takes 4 clock cycles
    inx
    adc $6ff,X      ;Takes 5 clock cycles

$efff:  lda Var
$f001:  bmi $efff   ;When the branch is taken, takes 4 cycles instead of 3

Find a way to make sure such instructions either never, or always, cross a page boundary, in order to accurately predict the timing.
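
As an illustration, both examples can be rearranged so that the page boundary is never crossed. The addresses are hypothetical and only serve to show the idea; how code and data actually get placed at such addresses depends on the assembler used.

```asm
    ldx #$00
    adc $600,X      ;Takes 4 clock cycles
    inx
    adc $600,X      ;Still takes 4 clock cycles: with the table starting at $600,
                    ;no value of X can reach the next page

$f000:  lda Var
$f002:  bmi $f000   ;When the branch is taken, always takes 3 cycles:
                    ;the next instruction ($f004) and the target are in the same page
```

The opposite choice works too: with "adc $6ff,X" and X always between 1 and 255, the boundary is crossed every time and the instruction always takes 5 cycles.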
----------
Conclusion
----------

I hope this document helps people who want to write a program that does some complicated graphics tricks on the NES but do not want to go through the headache of the unnecessary stuff. It covers most aspects of raster effects from the practical/coder's side and not from the emulator author's side. It explains how to exploit the effects, not what causes them. I hope this has been useful.

I'd like to thank all active members of NESdev, as without them I wouldn't have the knowledge to write this, even less to understand the "headache documents" for emulator authors. (I hope this one wasn't too much of a headache.)

For any questions or comments, please contact me via the Nesdev BBS at http://nesdev.parodius.com/bbs